Abstract

We study percolation on small-world networks, which has been proposed as 
a simple model of the propagation of disease. The occupation probabilities of 
sites and bonds correspond to the susceptibility of individuals to the disease 
and the transmissibility of the disease respectively. We give an exact solution 
of the model for both site and bond percolation, including the position of the 
percolation transition at which epidemic behavior sets in, the values of the 
two critical exponents governing this transition, and the mean and variance 
of the distribution of cluster sizes (disease outbreaks) below the transition. 



In the late 1960s, Milgram performed a number of experiments which led him to conclude
that, despite there being several billion human beings in the world, any two of them could
be connected by only a short chain of intermediate acquaintances of typical length about
six [1]. This result, known as the "small-world effect", has been confirmed by subsequent
studies and is now widely believed to be correct, although opinions differ about whether six
is an accurate estimate of typical chain length [2].

The small-world effect can be easily understood in terms of random graphs [3], for which
typical vertex-vertex distances increase only as the logarithm of the total number of vertices.
However, random graphs are a poor representation of the structure of real social networks,
which show a "clustering" effect in which there is an increased probability of two people
being acquainted if they have another acquaintance in common. This clustering is absent
in random graphs. Recently, Watts and Strogatz [4] have proposed a new model of social
networks which possesses both short vertex-vertex distances and a high degree of clustering.
In this model, sites are arranged on a one-dimensional lattice of size L, and each site
is connected to its nearest neighbors up to some fixed range k. Then additional links,
called "shortcuts", are added between randomly selected pairs of sites with probability φ per
link on the underlying lattice, giving an average of φkL shortcuts in total. The short-range
connections produce the clustering effect while the long-range ones give average distances
which increase logarithmically with system size, even for quite small values of φ.
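
The construction just described can be sketched in a few lines of Python. This is an illustration only, not code from the paper: the function name `small_world_edges` and its arguments are hypothetical, it assumes L > 2k so that the lattice bonds are distinct, and it allows a shortcut to join a site to itself or to duplicate an existing link, which is negligible for large L.

```python
import random

def small_world_edges(L, k, phi, rng):
    """Return (lattice_bonds, shortcuts) for a ring of L sites (hypothetical helper)."""
    # Each site i is linked to i+1, ..., i+k (mod L): k*L lattice bonds in total.
    lattice = [(i, (i + d) % L) for i in range(L) for d in range(1, k + 1)]
    # One shortcut is added with probability phi per lattice bond,
    # joining two sites chosen uniformly at random.
    shortcuts = [(rng.randrange(L), rng.randrange(L))
                 for _ in lattice if rng.random() < phi]
    return lattice, shortcuts

lattice, shortcuts = small_world_edges(L=1000, k=3, phi=0.05, rng=random.Random(0))
print(len(lattice), len(shortcuts))  # k*L lattice bonds, about phi*k*L shortcuts
```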

This model, commonly referred to as the "small-world model," has attracted a great
deal of attention from the physics community. A number of authors have looked at the
distribution of path lengths in the model, including scaling forms and exact and mean-field
results [5-8], while others have looked at a variety of dynamical systems on small-world
networks [9, 10, 11]. A review of recent developments can be found in Ref. [12].



One of the most important consequences of the small-world effect is in the propagation
of disease. Clearly a disease can spread much faster through a network in which the typical
person-to-person distance is O(log L) than it can through one in which the distance
is O(L). Epidemiology recognizes two basic parameters governing the effects of a disease:
the susceptibility, which is the probability that an individual exposed to a disease will
contract it, and the transmissibility, which is the probability that contact between an
infected individual and a healthy but susceptible one will result in the latter contracting
the disease. Newman and Watts studied a model of disease in a small world which incorporates
these variables. In this model a randomly chosen fraction p of the sites or bonds in the
small-world model are "occupied" to represent the effects of these two parameters. A disease
outbreak which starts with a single individual can then spread only within a connected
cluster of occupied sites or bonds. Thus the problem of disease spread maps onto a site or
bond percolation problem. At some threshold value p_c of the percolation probability, the
system undergoes a percolation transition which corresponds to the onset of epidemic
behavior for the disease in question. Newman and Watts gave an approximate solution for the
position of this transition on a small-world network.
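
The mapping from outbreaks to percolation clusters can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not the simulation code used for the figures: the names `find` and `outbreak_size_distribution` are hypothetical, a union-find structure is a convenient but arbitrary choice, and only site percolation is treated. It returns the fraction of sites whose outbreak would have size n, with n = 0 for unoccupied (non-susceptible) sites.

```python
import random
from collections import Counter

def find(parent, i):
    """Root of the cluster containing i, with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def outbreak_size_distribution(L, k, phi, p, seed=0):
    """Estimate the outbreak-size probabilities for site percolation (sketch)."""
    rng = random.Random(seed)
    occupied = [rng.random() < p for _ in range(L)]
    parent = list(range(L))
    size = [1] * L

    def union(a, b):
        # A link transmits disease only if both of its end sites are occupied.
        if not (occupied[a] and occupied[b]):
            return
        ra, rb = find(parent, a), find(parent, b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for i in range(L):
        for d in range(1, k + 1):
            union(i, (i + d) % L)                      # lattice bond
            if rng.random() < phi:                     # shortcut, probability phi per bond
                union(rng.randrange(L), rng.randrange(L))

    counts = Counter(size[find(parent, i)] if occupied[i] else 0 for i in range(L))
    return {n: c / L for n, c in counts.items()}

dist = outbreak_size_distribution(L=20000, k=2, phi=0.05, p=0.3, seed=1)
print(dist[0], dist[1])  # fraction of unoccupied sites, fraction of isolated occupied sites
```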

In this paper, we give an exact solution for both site and bond percolation on small-world
networks using a generating function method. Our method gives not only the exact position
of the percolation threshold, but also the values of the two critical exponents governing
behavior close to the transition, the complete distribution of the sizes of disease outbreaks
for any value of p below the transition, and closed-form expressions for the mean and variance
of the distribution. A calculation of the value of p_c only, using a transfer-matrix method,
has appeared previously as Ref. [13].



The basic idea behind our solution is to find the distribution of "local clusters" — clusters 
of occupied sites or bonds on the underlying lattice — and then calculate how the shortcuts 
join these local clusters together to form larger ones. We focus on the quantity P(n), which 
is the probability that a randomly chosen site belongs to a connected cluster of n sites. This 
is also the probability that a disease outbreak starting with a randomly chosen individual 
will affect n people. It is not the same as the distribution of cluster sizes for the percolation 
problem, since the probability of an outbreak starting in a cluster of size n increases with 
cluster size in proportion to n, all other things being equal. The cluster size distribution is 
therefore proportional to P(n)/n, and can be calculated easily from the results given in this
paper, although we will not do so.

Figure 1 Graphical representation of a cluster of connected sites. The entire
cluster (circle) is equal to a single local cluster (square), with any number m ≥ 0 of
complete clusters each attached to it by a single shortcut.

We start by examining the site percolation problem, which is the simpler case. Since P(n) 
is difficult to evaluate directly, we turn to a generating function method for its calculation. 
We define 

$$H(z) = \sum_{n=0}^{\infty} P(n)\, z^n. \tag{1}$$

For all p < 1, as we show below, the distribution of local clusters falls off with cluster
size exponentially, so that every shortcut leads to a different local cluster for L large: the
probability of two shortcuts connecting the same pair of local clusters falls off as L^{-1}. This
means that any complete cluster of sites consists of a local cluster with m ≥ 0 shortcuts
leading from it to m other clusters. Thus H(z) satisfies the Dyson-equation-like iterative
condition illustrated graphically in Fig. 1, and we can write it self-consistently as

$$H(z) = \sum_{n=0}^{\infty} P_0(n)\, z^n \sum_{m=0}^{\infty} P(m|n)\, [H(z)]^m. \tag{2}$$

In this equation P_0(n) is the probability of a randomly chosen site belonging to a local
cluster of size n, which is

$$P_0(n) = \begin{cases} 1 - p & \text{for } n = 0, \\ n p q^{n-1} (1-q)^2 & \text{for } n \ge 1, \end{cases} \tag{3}$$

with q = 1 - (1 - p)^k. P(m|n) is the probability of there being exactly m shortcuts emerging
from a local cluster of size n. Since there are 2φkL ends of shortcuts in the network, P(m|n)
is given by the binomial

$$P(m|n) = \binom{2\phi kL}{m} \left[\frac{n}{L}\right]^{m} \left[1 - \frac{n}{L}\right]^{2\phi kL - m}. \tag{4}$$



Using this expression Eq. (2) becomes

$$H(z) = \sum_{n=0}^{\infty} P_0(n)\, z^n \left[1 + (H(z) - 1)\frac{n}{L}\right]^{2\phi kL} = \sum_{n=0}^{\infty} P_0(n) \left[z\, e^{2k\phi(H(z)-1)}\right]^{n} \tag{5}$$
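
The two steps that turn the binomial of Eq. (4) into the exponential of Eq. (5) can be written out explicitly. The following is a sketch of the intermediate algebra, using only the binomial theorem and the large-L limit; the shorthand M = 2φkL is introduced here for convenience and does not appear in the original.

```latex
\begin{align*}
\sum_{m=0}^{M} \binom{M}{m} \Bigl[\frac{n}{L}\Bigr]^{m} \Bigl[1 - \frac{n}{L}\Bigr]^{M-m} [H(z)]^{m}
  &= \Bigl[1 - \frac{n}{L} + \frac{n}{L}\, H(z)\Bigr]^{M}
   = \Bigl[1 + (H(z) - 1)\frac{n}{L}\Bigr]^{2\phi kL} \\
  &\xrightarrow[L \to \infty]{} e^{2k\phi n (H(z) - 1)},
\qquad\text{so that}\qquad
z^{n}\, e^{2k\phi n (H(z)-1)} = \bigl[z\, e^{2k\phi (H(z)-1)}\bigr]^{n}.
\end{align*}
```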



for L large. The remaining sum over n can now be performed conveniently by defining

$$H_0(z) = \sum_{n=0}^{\infty} P_0(n)\, z^n = 1 - p + pz\, \frac{(1-q)^2}{(1-qz)^2}, \tag{6}$$

where the second equality holds in the limit of large L and we have made use of (3). H_0(z)
is the generating function for the local clusters. Now we notice that H(z) in Eq. (5) is equal
to H_0(z) with z → z e^{2kφ(H(z)-1)}. Thus

$$H(z) = H_0\!\left(z\, e^{2k\phi(H(z)-1)}\right). \tag{7}$$
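
The closed form in Eq. (6) is easy to check against a direct partial sum of P_0(n) z^n from Eq. (3). The snippet below is a small sanity check with hypothetical function names; the truncation `n_max` is an arbitrary choice that is ample for the parameter values shown.

```python
def local_cluster_pgf(z, p, k):
    """Closed form of Eq. (6) for site percolation."""
    q = 1 - (1 - p) ** k
    return 1 - p + p * z * (1 - q) ** 2 / (1 - q * z) ** 2

def local_cluster_pgf_series(z, p, k, n_max=2000):
    """Direct partial sum of P_0(n) z^n using Eq. (3)."""
    q = 1 - (1 - p) ** k
    total = 1 - p                                   # the n = 0 term
    for n in range(1, n_max + 1):
        total += n * p * q ** (n - 1) * (1 - q) ** 2 * z ** n
    return total

# The two agree, and H_0(1) = 1 because every site belongs to some finite local cluster.
print(local_cluster_pgf(0.9, 0.3, 2), local_cluster_pgf_series(0.9, 0.3, 2))
print(local_cluster_pgf(1.0, 0.3, 2))
```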



H(z) can be calculated directly by iteration of this equation starting with H(z) = 1 to
give the complete distribution of sizes of epidemics in the model. It takes n steps of the
iteration to calculate P(n) exactly. The first few steps give

$$P(0) = 1 - p, \tag{8}$$
$$P(1) = p(1-q)^2\, e^{-2k\phi p}, \tag{9}$$
$$P(2) = p(1-q)^2 \left[2q + 2k\phi p(1-q)^2\right] e^{-4k\phi p}. \tag{10}$$
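
The iteration of Eq. (7) can be carried out on truncated power series. The sketch below is one possible way to do this, not necessarily the procedure used by the authors; the helper names are hypothetical, and N + 1 passes are used as a conservative choice so that all coefficients up to z^N have settled.

```python
import math

def series_mul(a, b, N):
    """Product of two power series, truncated after z^N."""
    c = [0.0] * (N + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j > N:
                break
            c[i + j] += ai * bj
    return c

def series_exp(f, N):
    """exp(f(z)) up to z^N, from E' = f' E, i.e. n e_n = sum_k k f_k e_{n-k}."""
    e = [0.0] * (N + 1)
    e[0] = math.exp(f[0])
    for n in range(1, N + 1):
        e[n] = sum(j * f[j] * e[n - j] for j in range(1, n + 1)) / n
    return e

def outbreak_probabilities(p, k, phi, N):
    """Coefficients P(0), ..., P(N) of H(z) obtained by iterating Eq. (7)."""
    q = 1 - (1 - p) ** k
    a = 2 * k * phi
    H = [1.0] + [0.0] * N                      # start from H(z) = 1
    for _ in range(N + 1):
        f = [a * h for h in H]
        f[0] -= a                              # f = 2 k phi (H - 1)
        w = [0.0] + series_exp(f, N)[:N]       # w = z exp(f), truncated
        new_H = [0.0] * (N + 1)
        new_H[0] = 1 - p
        w_power = [1.0] + [0.0] * N            # w^0
        for n in range(1, N + 1):              # H_0(w) = 1 - p + sum_n P_0(n) w^n
            w_power = series_mul(w_power, w, N)
            coef = n * p * q ** (n - 1) * (1 - q) ** 2
            for i in range(N + 1):
                new_H[i] += coef * w_power[i]
        H = new_H
    return H

print(outbreak_probabilities(p=0.3, k=2, phi=0.05, N=4))
```

The first three entries can be compared directly with Eqs. (8) to (10).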



It is straightforward to verify that these are correct. We could also iterate Eq. (7) numerically
and then estimate P(n) using, for instance, forward differences at z = 0. Unfortunately, like
many calculations involving numerical derivatives, this method suffers from severe
machine-precision problems which limit us to small values of n, on the order of n < 20. A much
better technique is to evaluate H(z) around a contour in the complex plane and calculate
the derivatives using the Cauchy integral formula:

$$P(n) = \frac{1}{n!} \frac{d^n H}{dz^n}\bigg|_{z=0} = \frac{1}{2\pi i} \oint \frac{H(z)}{z^{n+1}}\, dz. \tag{11}$$




Figure 2 The distribution of outbreak sizes in simulations of the site percolation
model with L = 10^7, k = 5, φ = 0.01, and p = 0.25, 0.30, 0.35, and p = p_c = 0.40101
(circles, squares, and up- and down-pointing triangles respectively). The solid lines
are the same distributions calculated using Eqs. (7) and (11). Inset: The average size
of disease outbreaks as a function of p for (left to right) φ = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}.
The points are numerical results for L = 10^7, k = 5 and the solid lines are the exact
result, Eq. (12).



A good choice of contour in the present case is the unit circle |z| = 1. Using this method
we have been able to calculate the first thousand derivatives of H(z) without difficulty.

In Fig. 2 we show the distribution of outbreak sizes as a function of n calculated from
Eq. (11) for a variety of values of p. On the same plot we also show the distribution of
outbreaks measured in computer simulations of the model on systems of L = 10^7 sites. As
the figure shows, the agreement between the two is excellent.
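
The contour-integral evaluation of Eq. (11) on the unit circle can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, H(z) is obtained at each contour point by fixed-point iteration of Eq. (7) starting from H = 1 (which is assumed to converge, as it does below the percolation threshold), and the integral is replaced by an equally spaced sum over `points` angles.

```python
import cmath
import math

def H_on_contour(z, p, k, phi, iterations=100):
    """Solve Eq. (7) at one complex z by fixed-point iteration from H = 1."""
    q = 1 - (1 - p) ** k
    H = 1.0 + 0.0j
    for _ in range(iterations):
        w = z * cmath.exp(2 * k * phi * (H - 1))
        H = 1 - p + p * w * (1 - q) ** 2 / (1 - q * w) ** 2   # H_0(w), Eq. (6)
    return H

def outbreak_probability(n, p, k, phi, points=512):
    """Approximate Eq. (11) on |z| = 1.

    With z = exp(i theta), dz = i z d(theta), so the integral becomes the
    average of H(z) exp(-i n theta) over the circle.
    """
    total = 0.0j
    for j in range(points):
        theta = 2 * math.pi * j / points
        z = cmath.exp(1j * theta)
        total += H_on_contour(z, p, k, phi) * cmath.exp(-1j * n * theta)
    return (total / points).real

# Parameters taken from the first curve of Fig. 2.
print([outbreak_probability(n, p=0.25, k=5, phi=0.01) for n in range(4)])
```

The discrete sum actually returns P(n) plus aliased terms P(n + points), P(n + 2 points), and so on, so `points` should be chosen large compared with the typical outbreak size.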

We can also calculate any moment of the distribution in closed form using Eq. (7). For
example, the mean outbreak size is given by the first derivative of H:

$$\langle n \rangle = H'(1) = \frac{H_0'(1)}{1 - 2k\phi H_0'(1)} = \frac{p(1+q)}{1 - q - 2k\phi p(1+q)}, \tag{12}$$

and the variance is given by

$$\langle n^2 \rangle - \langle n \rangle^2 = H''(1) + H'(1) - [H'(1)]^2 = \frac{p\left[1 + 3q - 3q^2 - q^3 - p(1-q)(1+q)^2 + 2k\phi p^2 (1+q)^3\right]}{\left[1 - q - 2k\phi p(1+q)\right]^3}. \tag{13}$$

In the inset of Fig. 2 we show Eq. (12) for various values of φ along with numerical results
from simulations of the model, and the two are again in good agreement.
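
Equations (12) and (13) can be checked numerically against finite differences of H(z) at z = 1. The sketch below uses hypothetical function names and assumes p is below the percolation threshold, so that the fixed-point iteration of Eq. (7) converges for real z close to 1; the step `d` is an arbitrary choice.

```python
import math

def mean_outbreak(p, k, phi):
    """Mean outbreak size, Eq. (12)."""
    q = 1 - (1 - p) ** k
    return p * (1 + q) / (1 - q - 2 * k * phi * p * (1 + q))

def variance_outbreak(p, k, phi):
    """Variance of the outbreak size, Eq. (13)."""
    q = 1 - (1 - p) ** k
    num = p * (1 + 3 * q - 3 * q ** 2 - q ** 3
               - p * (1 - q) * (1 + q) ** 2
               + 2 * k * phi * p ** 2 * (1 + q) ** 3)
    return num / (1 - q - 2 * k * phi * p * (1 + q)) ** 3

def H(z, p, k, phi, iterations=200):
    """Fixed point of Eq. (7) for real z near 1 (assumes p below threshold)."""
    q = 1 - (1 - p) ** k
    h = 1.0
    for _ in range(iterations):
        w = z * math.exp(2 * k * phi * (h - 1))
        h = 1 - p + p * w * (1 - q) ** 2 / (1 - q * w) ** 2
    return h

p, k, phi, d = 0.25, 5, 0.01, 1e-4
H1 = (H(1 + d, p, k, phi) - H(1 - d, p, k, phi)) / (2 * d)                      # ~ H'(1)
H2 = (H(1 + d, p, k, phi) - 2 * H(1, p, k, phi) + H(1 - d, p, k, phi)) / d ** 2  # ~ H''(1)
print(mean_outbreak(p, k, phi), H1)
print(variance_outbreak(p, k, phi), H2 + H1 - H1 ** 2)
```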

The mean outbreak size diverges at the percolation threshold p = p_c. This threshold
marks the onset of epidemic behavior in the model and occurs at the zero of the
denominator of Eq. (12). The value of p_c is thus given by

$$\phi = \frac{1 - q_c}{2k p_c (1 + q_c)} = \frac{(1 - p_c)^k}{2k p_c \left[2 - (1 - p_c)^k\right]}, \tag{14}$$

in agreement with Ref. [13]. The value of p_c calculated from this expression is shown in the
left panel of Fig. 3 for three different values of k.
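
Equation (14) gives φ as a function of p_c rather than the other way round, so in practice p_c is found by solving it numerically. The following sketch uses bisection, which is one simple choice among several; the function names are hypothetical. It relies on the right-hand side of Eq. (14) decreasing monotonically from infinity to zero as p_c goes from 0 to 1.

```python
def shortcut_density_at_threshold(p_c, k):
    """Right-hand side of Eq. (14): the phi for which the threshold is p_c."""
    r = (1 - p_c) ** k
    return r / (2 * k * p_c * (2 - r))

def site_percolation_threshold(phi, k, tol=1e-12):
    """Solve Eq. (14) for p_c by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shortcut_density_at_threshold(mid, k) > phi:
            lo = mid    # more shortcuts than available would be needed: p_c > mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Parameters of Fig. 2: k = 5 and phi = 0.01, for which the caption quotes p_c = 0.40101.
print(site_percolation_threshold(phi=0.01, k=5))
```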

The denominator of Eq. (12) is analytic at p = p_c and has a non-zero first derivative with
respect to p, so that to leading order the divergence in ⟨n⟩ goes as (p_c - p)^{-1} as we approach
percolation. Defining a critical exponent σ in the conventional fashion ⟨n⟩ ~ (p_c - p)^{-1/σ},
we then have

$$\sigma = 1. \tag{15}$$

Near p_c we expect P(n) to behave as

$$P(n) \sim n^{-\tau} e^{-n/n^*} \quad \text{as } n \to \infty. \tag{16}$$

It is straightforward to show that both the typical outbreak size n* and the exponent
τ are governed by the singularity of H(z) closest to the origin: n* = (log z*)^{-1}, where z* is
the position of the singularity, and τ = α + 1, where H(z) ~ (z* - z)^α close to z*.

In general, the singularity of interest may be either finite or not; the order of the lowest
derivative of H(z) which diverges at z* depends on the value of α. In the present case, H(z*)
is finite but the first derivative diverges, and we can use this to find z* and α.



Figure 3 Numerical results for the percolation threshold as a function of shortcut
density φ for systems of size L = 10^6 (points). Left panel: site percolation with
k = 1 (circles), 2 (squares), and 5 (triangles). Right panel: bond percolation with
k = 1 (circles) and 2 (squares). The solid lines are the analytic expressions for the
same quantities, Eqs. (14), (25), and (26).

Although we do not have a closed-form expression for H(z), it is simple to derive one
for its functional inverse H^{-1}(w). Putting H(z) → w and z → H^{-1}(w) in Eq. (7) and
rearranging we find

$$H^{-1}(w) = H_0^{-1}(w)\, e^{2k\phi(1-w)}. \tag{17}$$

The singularity in H(z) corresponds to the point w* at which the derivative of H^{-1}(w) is
zero, which gives 2kφ z* H_0'(z*) = 1, making z* = e^{1/n*} a real root of the cubic equation

$$(1 - qz)^3 - 2k\phi p z (1-q)^2 (1 + qz) = 0. \tag{18}$$

The second derivative of H^{-1}(w) is non-zero at w*, which implies that H(z) ~ (z* - z)^{1/2}
and hence α = 1/2 and the outbreak size exponent is

$$\tau = \tfrac{3}{2}. \tag{19}$$

A power-law fit to the simulation data for P(n) shown in Fig. 2 gives τ = 1.501 ± 0.001 in
good agreement with this result.
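
The step from a vanishing first derivative of H^{-1}(w) to the exponents α = 1/2 and τ = 3/2 can be made explicit. The following is a sketch of the standard argument, not taken from the paper: the symbol c is introduced here as shorthand for the second derivative of H^{-1} at w*, taken to be negative as for a maximum of H^{-1}, and the last line uses the standard large-n behavior of the series coefficients at a square-root branch point.

```latex
\begin{align*}
z = H^{-1}(w) &\simeq z^* + \tfrac{1}{2}\, c\, (w - w^*)^2,
  \qquad c = \frac{d^2 H^{-1}}{dw^2}\bigg|_{w = w^*} < 0, \\
H(z) = w &\simeq w^* - \sqrt{\frac{2}{|c|}}\; (z^* - z)^{1/2}
  \quad\Longrightarrow\quad \alpha = \tfrac{1}{2}, \\
[z^n]\, (z^* - z)^{1/2} &\propto n^{-3/2}\, (z^*)^{-n} = n^{-3/2}\, e^{-n \log z^*}
  \quad\Longrightarrow\quad \tau = \alpha + 1 = \tfrac{3}{2}, \quad n^* = (\log z^*)^{-1}.
\end{align*}
```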

The values σ = 1 and τ = 3/2 put the small-world percolation problem in the same
universality class as percolation on a random graph [15], which seems reasonable since the
effective dimension of the small-world model in the limit of large system size is infinite,
just as it is for a random graph.

We close our analysis of the site percolation problem by noting that Eq. (7) is similar
in structure to the equation H(z) = z e^{H(z)} for the generating function of the set of rooted,
labeled trees. This leads us to conjecture that it may be possible to find a closed-form
expression for the coefficients of the generating function H(z) using the Lagrange inversion
formula [16].
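
For the tree equation H(z) = z e^{H(z)} mentioned above, Lagrange inversion gives the well-known closed form n^{n-1}/n! for the coefficient of z^n. The sketch below only illustrates that simpler case, as a model of what a closed form for Eq. (7) would look like; it does not solve the conjecture. The function names are hypothetical, and exact rational arithmetic is used so the comparison is exact.

```python
from fractions import Fraction
from math import factorial

def series_exp(f, N):
    """Coefficients of exp(f(z)) up to z^N, assuming f has zero constant term."""
    e = [Fraction(0)] * (N + 1)
    e[0] = Fraction(1)
    for n in range(1, N + 1):
        # From E' = f' E:  n e_n = sum_{j=1..n} j f_j e_{n-j}
        e[n] = sum(j * f[j] * e[n - j] for j in range(1, n + 1)) / n
    return e

def tree_function_coefficients(N):
    """Iterate T <- z exp(T) from T = 0; each pass fixes one more coefficient."""
    T = [Fraction(0)] * (N + 1)
    for _ in range(N):
        T = [Fraction(0)] + series_exp(T, N)[:N]   # multiply exp(T) by z
    return T

T = tree_function_coefficients(8)
for n in range(1, 9):
    print(n, T[n], Fraction(n ** (n - 1), factorial(n)))
```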



Turning to bond percolation, we can apply the same formalism as above with only two
modifications. First, the probability P_0(n) that a site belongs to a local cluster of size n is
different for bond percolation and consequently so is H_0(z) (Eq. (6)). For the case k = 1

$$P_0(n) = n p^{n-1} (1-p)^2, \tag{20}$$

where p is now the bond occupation probability. This expression is the same as Eq. (3) for
the site percolation case except that P_0(0) is now zero and P_0(n ≥ 1) contains one less factor
of p. H_0(z) for k = 1 is

$$H_0(z) = z\, \frac{(1-p)^2}{(1-pz)^2}. \tag{21}$$

For k > 1, calculating P_0(n) is considerably more complex, and in fact it is not clear whether
a closed-form solution exists. However, it is possible to write down the form of H_0(z) directly
using the method given in Ref. [13]. For k = 2, for instance,



$$H_0(z) = \frac{z(1-p)^4 \left(1 - 2pz + p^3(1-z)z + p^2 z^2\right)}{1 - 4pz + p^5(2-3z)z^2 - p^6(1-z)z^2 + p^4 z^2(1+3z) + p^2 z(4+3z) - p^3 z(1 + 5z + z^2)}. \tag{22}$$
The second modification to the method is that in order to connect two local clusters a
shortcut now must not only be present (which happens with probability φ) but must also
be occupied (which happens with probability p). This means that every former occurrence
of φ is replaced with φp. The rest of the analysis follows through as before and we find that
H(z) satisfies the recurrence relation

$$H(z) = H_0\!\left(z\, e^{2k\phi p(H(z)-1)}\right), \tag{23}$$

with H_0 as above. Thus, for example, the mean outbreak size is now

$$\langle n \rangle = H'(1) = \frac{H_0'(1)}{1 - 2k\phi p H_0'(1)}, \tag{24}$$

and the percolation transition occurs at 2kφp H_0'(1) = 1, which gives

$$\phi = \frac{1 - p_c}{2 p_c (1 + p_c)} \tag{25}$$

for k = 1 and

$$\phi = \frac{(1 - p_c)^3 (1 - p_c + p_c^2)}{4 p_c \left(1 + 3p_c^2 - 3p_c^3 - 2p_c^4 + 5p_c^5 - 2p_c^6\right)} \tag{26}$$

for k = 2. As in the site percolation case, the critical exponents are σ = 1 and τ = 3/2. In the
right panel of Fig. 3 we show curves of p_c as a function of φ for the bond percolation model
for k = 1 and k = 2, along with numerical results for the same quantities. The agreement
between the exact solution and the simulation results is good.
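
For k = 1, Eq. (25) can be inverted by hand, since it is equivalent to the quadratic 2φp_c² + (2φ + 1)p_c - 1 = 0. The sketch below evaluates its positive root; the function name is hypothetical, and the root is written in the rationalized form 2/(b + sqrt(b² + 8φ)) with b = 2φ + 1, an implementation choice made here to avoid cancellation at small φ.

```python
import math

def bond_threshold_k1(phi):
    """Bond percolation threshold for k = 1, from Eq. (25) solved for p_c."""
    b = 2 * phi + 1
    # Positive root of 2*phi*p^2 + b*p - 1 = 0, i.e. (-b + sqrt(b^2 + 8*phi)) / (4*phi),
    # rewritten so that it is also well behaved as phi -> 0 (where p_c -> 1).
    return 2 / (b + math.sqrt(b * b + 8 * phi))

for phi in (1e-4, 1e-3, 1e-2, 1e-1, 1.0):
    p_c = bond_threshold_k1(phi)
    # Substituting back into Eq. (25) should reproduce phi.
    print(phi, p_c, (1 - p_c) / (2 * p_c * (1 + p_c)))
```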

We can also apply our method to the case of simultaneous site and bond percolation,
by replacing P_0(n) with the appropriate distribution of local cluster sizes and making the
replacement φ → φ p_bond as above. The developments are simple for the case k = 1 but the
combinatorics become tedious for larger k and so we leave these calculations to the interested
(and ambitious) reader.

To conclude, we have studied the site and bond percolation problems in the
Watts-Strogatz small-world model as a simple model of the spread of disease. Using a generating
function method we have calculated exactly the position of the percolation transition at
which epidemics first appear, the values of the two critical exponents describing this
transition, and the sizes of disease outbreaks below the transition. We have confirmed our results
with extensive computer simulations of disease spread in small-world networks.

Finally, we would like to point out that the method described here can in principle be
extended to small-world networks built on underlying lattices of higher dimensions.
Only the generating function for the local clusters H_0(z) needs to be recalculated, although
this is no trivial task; such a calculation for a square lattice with k = 1 would be equivalent
to a solution of the normal percolation problem on such a lattice, something which has not
yet been achieved. Even without a knowledge of H_0(z), however, it is possible to deduce
some results. For example, we believe that the critical exponents will take the values σ = 1
and τ = 3/2, just as in the one-dimensional case, for the exact same reasons. It would be
possible to test this conjecture numerically.

The authors are grateful to Michael Renardy for pointing out Eq. (17), and to Keith Briggs,
Noam Elkies, Philippe Flajolet, and David Rusin for useful comments. This work was supported
in part by the Santa Fe Institute and DARPA under grant number ONR N00014-95-1-0975.